mgr/MgrStandby: respawn when deactivated #15557
Merged
- It is ugly to unwind all of the Mgr state so that we can reactivate later.
- It is perhaps impossible to shut down the Python state reliably.
- Respawning provides a clean state and is reliable.

This mostly just copies MDSServer::respawn().

Fixes: http://tracker.ceph.com/issues/19595
Fixes: http://tracker.ceph.com/issues/19549
Signed-off-by: Sage Weil <sage@redhat.com>
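The respawn idea can be sketched roughly as follows. This assumes a Linux host (for `/proc/self/exe`) and that `main()` saved a copy of its original command-line arguments at startup. All names here (`build_exec_argv`, `resolve_self_exe`, `respawn`) are hypothetical; this is not the actual MgrStandby or MDS code. It only illustrates the exec-in-place technique: instead of unwinding Mgr and Python state by hand, replace the process image with a fresh copy of the same binary so that state is discarded by the kernel.

```cpp
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <limits.h>
#include <signal.h>
#include <string>
#include <unistd.h>
#include <vector>

// Hypothetical helper: build a null-terminated argv array for execv().
// The returned pointers refer into orig_args, so orig_args must outlive it.
std::vector<char*> build_exec_argv(const std::vector<std::string>& orig_args) {
  std::vector<char*> argv;
  argv.reserve(orig_args.size() + 1);
  for (const auto& a : orig_args)
    argv.push_back(const_cast<char*>(a.c_str()));
  argv.push_back(nullptr);  // execv() requires a terminating null pointer
  return argv;
}

// Hypothetical helper: resolve the running binary via /proc/self/exe
// (Linux-specific) so a relative argv[0] or a changed cwd does not matter.
// Falls back to the given path if the link cannot be read.
std::string resolve_self_exe(const std::string& fallback) {
  char buf[PATH_MAX];
  ssize_t n = ::readlink("/proc/self/exe", buf, sizeof(buf) - 1);
  if (n <= 0) return fallback;
  buf[n] = '\0';  // readlink() does not null-terminate
  return std::string(buf);
}

// Hypothetical respawn: replace this process image with a fresh copy of
// itself. On success it never returns; the PID stays the same.
[[noreturn]] void respawn(const std::vector<std::string>& orig_args) {
  std::vector<char*> argv = build_exec_argv(orig_args);
  std::string exe = resolve_self_exe(orig_args.at(0));

  // The signal mask survives exec, so clear it; otherwise the new image
  // would start with whatever signals this thread had blocked.
  sigset_t mask;
  sigemptyset(&mask);
  sigprocmask(SIG_SETMASK, &mask, nullptr);

  ::execv(exe.c_str(), argv.data());
  // Only reached if execv() failed.
  std::cerr << "respawn: execv failed: " << std::strerror(errno) << std::endl;
  std::abort();
}
```

A daemon would call `respawn(saved_args)` at the point where it would otherwise try to tear down its active state. Note that file descriptors not marked close-on-exec stay open across `execv()`, so a real implementation has to account for them.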
I just hit another Mgr-shutdown bug in my last run:

2017-06-07T18:29:47.046 INFO:tasks.ceph.mgr.x.smithi116.stderr:src/tcmalloc.cc:278] Attempt to free invalid pointer 0x1f
2017-06-07T18:29:47.046 INFO:tasks.ceph.mgr.x.smithi116.stderr:*** Caught signal (Aborted) **
2017-06-07T18:29:47.046 INFO:tasks.ceph.mgr.x.smithi116.stderr: in thread 7f8da56cf700 thread_name:fn_anonymous
2017-06-07T18:29:47.047 INFO:tasks.ceph.mgr.x.smithi116.stderr: ceph version 12.0.2-2485-gc8340cd (c8340cde85674f8d9506d602368c2fd9a6307580) luminous (dev)
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 1: (()+0x393172) [0x56490d9f6172]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 2: (()+0x113e0) [0x7f8dacb8f3e0]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 3: (gsignal()+0x38) [0x7f8dabb20428]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 4: (abort()+0x16a) [0x7f8dabb2202a]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 5: (tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem)+0x22e) [0x7f8dad7625ce]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 6: (()+0x1375f) [0x7f8dad75675f]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 7: (operator delete[](void*)+0x1fd) [0x7f8dad77966d]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 8: (std::_Rb_tree, std::allocator > >, std::pair, std::allocator > >, std::_Identity, std::allocator > > >, std::less, std::allocator > > >, std::allocator, std::allocator > > > >::erase(std::pair, std::allocator > > const&)+0x63) [0x56490d903723]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 9: (MetadataUpdate::finish(int)+0x43) [0x56490d905fb3]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 10: (Context::complete(int)+0x9) [0x56490d8cab79]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: 11: (Finisher::finisher_thread_entry()+0x460) [0x56490da35480]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: 12: (()+0x770a) [0x7f8dacb8570a]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: 13: (clone()+0x6d) [0x7f8dabbf182d]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr:2017-06-07 18:29:47.048186 7f8da56cf700 -1 *** Caught signal (Aborted) **
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: in thread 7f8da56cf700 thread_name:fn_anonymous

But fixing these feels like a waste of time.
Tests look okay...
jcsp approved these changes on Jun 8, 2017
This looks fine. I was wondering if we could avoid the copy-paste by putting the respawn bit somewhere common, but it's hardly essential.